Previously I worked on creating a graph from a table object. It works well enough, although I need to figure out how I can represent parallel tasks effectively.

Now I’m thinking about how I can start getting some more complex plotting out of this.

Looking up GraphViz I found that there is some support for ‘clusters’. At this point I’m wondering how much I’m going to be using R, compared to using this GraphViz platform.

I generated some practice process data, since my work probably doesn’t want me showing the world their process.

ms <- read.csv("mealslurry.csv")

So, to turn this into a graph, I can make some assumptions about at least the order of the top level process and the ‘process’, in that they are all in order.

Working with the data

Taking a stab at data.tree

library(data.tree)

So the data.tree package uses ‘path to string’ methods for data import. Let’s try that

ms$pathString <- paste("mealslurry",
                       ms$top.level.process,
                       ms$process,
                       ms$task.definition,
                       sep = "/")

ms.dt <- as.Node(ms)
print(ms.dt)
##                                         levelName
## 1  mealslurry                                    
## 2   ¦--acquire dry goods                         
## 3   ¦   ¦--online shopping                       
## 4   ¦   ¦   ¦--buy crickets                      
## 5   ¦   ¦   ¦--buy curry powder                  
## 6   ¦   ¦   ¦--buy whey powder                   
## 7   ¦   ¦   ¦--citric acid                       
## 8   ¦   ¦   ¦--crushed red pepper                
## 9   ¦   ¦   °--black pepper                      
## 10  ¦   °--grocery store                         
## 11  ¦       ¦--salt                              
## 12  ¦       ¦--garlic powder                     
## 13  ¦       °--dried diced onions                
## 14  ¦--acquire grocery goods                     
## 15  ¦   °--grocery store                         
## 16  ¦       ¦--buy butter                        
## 17  ¦       ¦--buy frozen vegetables             
## 18  ¦       °--buy yogurt                        
## 19  ¦--cooking                                   
## 20  ¦   ¦--lentils                               
## 21  ¦   ¦   ¦--soak lentils                      
## 22  ¦   ¦   ¦--cook lentils                      
## 23  ¦   ¦   ¦--cool lentils                      
## 24  ¦   ¦   ¦--inoculate with yogurt             
## 25  ¦   ¦   ¦--blend                             
## 26  ¦   ¦   °--let stand 24hr                    
## 27  ¦   ¦--cook                                  
## 28  ¦   ¦   ¦--heat lentils                      
## 29  ¦   ¦   ¦--add spices to lentils             
## 30  ¦   ¦   ¦--combine butter and vegetables     
## 31  ¦   ¦   °--add garlic and onion to vegetables
## 32  ¦   °--blend                                 
## 33  ¦       °--blend lentils                     
## 34  ¦--portion                                   
## 35  ¦   ¦--prep jars                             
## 36  ¦   ¦   ¦--collect jars                      
## 37  ¦   ¦   ¦--open                              
## 38  ¦   ¦   °--get funnel and spoon              
## 39  ¦   °--load jars                             
## 40  ¦       ¦--3 scoops per jar lentils          
## 41  ¦       ¦--add vegetables to lentil          
## 42  ¦       ¦--sitr                              
## 43  ¦       ¦--3 scoops per jar + remaining      
## 44  ¦       °--tighten lids on jars              
## 45  °--storage                                   
## 46      ¦--cool                                  
## 47      ¦   °--cool for a couple hours           
## 48      °--fridge                                
## 49          °--place in fridge

That’s all well and good, but I think I’m losing the aspects of how my data really are a directed acyclic graph.

Going back to DiagrammeR

Since I have three levels, I want to experiment with creating smaller graphs, and then linking them together.

working with dry goods

dg <- ms %>% filter(top.level.process == "acquire dry goods")

So, for dry goods, there’s no real process to the tasks, just that I do all of them at once (either online or in a grocery store)

So I’m thinking that it would be two nodes, online to grocery store, with a bunch of leafs off each one.

although it would probably be better to work with the cooking part of the graph

cook <- ms %>% filter(top.level.process == "cooking")
# for some reason this is throwing an error.
# if the cols are factors, it throws an error


cook_gf <- create_graph()

cook_gf %>% 
add_nodes_from_df_cols( df = cook, columns = c("process", "task.definition")) %>% render_graph()

well this doesn’t work that well

create_node_df(n = length(cook$task.definition),
               type = unique(cook$top.level.process),
               label = cook$task.definition,
               process = cook$process)
##    id    type                              label process
## 1   1 cooking                       soak lentils lentils
## 2   2 cooking                       cook lentils lentils
## 3   3 cooking                       cool lentils lentils
## 4   4 cooking              inoculate with yogurt lentils
## 5   5 cooking                              blend lentils
## 6   6 cooking                     let stand 24hr lentils
## 7   7 cooking                       heat lentils    cook
## 8   8 cooking              add spices to lentils    cook
## 9   9 cooking      combine butter and vegetables    cook
## 10 10 cooking add garlic and onion to vegetables    cook
## 11 11 cooking                      blend lentils   blend

hmm, maybe I should be breaking it up into process

cook.ls <- split(x = cook, f = cook$process)

So, for each process, I want to make some assumptions that there are no parallel steps (I feel like that’s a poor assumption even as I write this). But this could make it easier to create individual DAGs, and then somehow cluster them together when the time comes (for this specific example the lentils and veg can be done in parallel … too bad my data doesn’t represent that :( )

Working with DAGs

For each list element, make a DAG

let’s sketch out what I would do:

x <- cook.ls$lentils

# create node df

nodes_x <- create_node_df(n = length(x$top.level.process), label = x$task.definition, type = x$process,
               top_level_process = x$top.level.process)
# create edge df
edges_x <- create_edge_df(from = 1:(length(x$top.level.process) -1), to = 2:(length(x$top.level.process)))
# create graph
create_graph(nodes_df = nodes_x, edges_df = edges_x) %>% render_graph

neat, let’s make a function out of it

MakeProcessDAG <- function(x){
  # x is a dataframe containing the steps of one process
  # expected cols are:
  # top.level.process = the top level process from the RACI table
  # process = the single process captured by this table
  # task.definition = the tasks involved in the process
  # assuming tasks are ordered sequentially
  
  n_tasks <- length(x$task.definition)
  
  # create node_df
  nodes_x <- create_node_df(n = n_tasks,
                            label = x$task.definition,
                            type = x$process,
                            top_level_process = x$top.level.process,
                            shape = "plaintext") # this doesn't seem to work :(
  # create edge df
  if(n_tasks == 1){
    # catching if there is only one task in a process
    graph_x <- create_graph(nodes_x)
    
  }else{
    edges_x <- create_edge_df(from = seq_len(n_tasks -1),
                              to = 2:(n_tasks))
    # create graph
    graph_x <- create_graph(nodes_df = nodes_x, edges_df = edges_x)
  }
  
  return(graph_x)
}

Results

lentils
MakeProcessDAG(cook.ls$lentils) %>% render_graph()
cook
MakeProcessDAG(cook.ls$cook) %>% render_graph()
blend
MakeProcessDAG(cook.ls$blend) %>% render_graph()

Using the function

# switching style to not using . in names, #whatever
cookgh_ls <- lapply(cook.ls, MakeProcessDAG)

Now how to connect them together …

OK, I have all the individual processes turned into DAGs, but now how can I connect them together?

Looks like the github.io docs actually has some content that is meaningful

combine_graphs

combine_graphs(cookgh_ls$lentils, cookgh_ls$cook) %>% 
  render_graph

understandably this doesn’t connect the two graphs

combine_graphs(cookgh_ls$lentils, cookgh_ls$cook) %>% 
  get_node_df()
##    id    type                              label top_level_process
## 1   1 lentils                       soak lentils           cooking
## 2   2 lentils                       cook lentils           cooking
## 3   3 lentils                       cool lentils           cooking
## 4   4 lentils              inoculate with yogurt           cooking
## 5   5 lentils                              blend           cooking
## 6   6 lentils                     let stand 24hr           cooking
## 7   7    cook                       heat lentils           cooking
## 8   8    cook              add spices to lentils           cooking
## 9   9    cook      combine butter and vegetables           cooking
## 10 10    cook add garlic and onion to vegetables           cooking
##        shape
## 1  plaintext
## 2  plaintext
## 3  plaintext
## 4  plaintext
## 5  plaintext
## 6  plaintext
## 7  plaintext
## 8  plaintext
## 9  plaintext
## 10 plaintext

…. so I’m realzing that I could probably just create a graph for the entire cooking process

n_cook_tasks <- length(cook$task.definition)

cook_nodes <- create_node_df(n = n_cook_tasks,
                             label = cook$task.definition, 
                             type = cook$process,
                             top_level_process = cook$top.level.process)
cook_edges <- create_edge_df(from = seq_len(n_cook_tasks - 1), 
                             to = 2:n_cook_tasks)

cook_graph <- create_graph(nodes_df = cook_nodes, edges_df = cook_edges)

render_graph(cook_graph, layout = "circle")

Now, how do I get some kind of boxes around the sections that are the different process?

playing around with the render_graph function, turns out visNetwork can give you some nice functionality, and it seems to color based on type without me specifying it. So that’s something. … also, it lays out differently each time … strange :/

render_graph(cook_graph, output = "visNetwork", layout = "circle")